Histopathologic Cancer Detection

Dataset Description

In this dataset, you are provided with a large number of small pathology images to classify. Files are named with an image id. The train_labels.csv file provides the ground truth for the images in the train folder. You are predicting the labels for the images in the test folder. A positive label indicates that the center 32x32px region of a patch contains at least one pixel of tumor tissue. Tumor tissue in the outer region of the patch does not influence the label. This outer region is provided to enable fully-convolutional models that do not use zero-padding, to ensure consistent behavior when applied to a whole-slide image.
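The labeling rule can be illustrated with a short sketch. It assumes 96x96 px patches (the patch size is not stated above) and a hypothetical binary tumor mask for each patch. Such masks are not part of the Kaggle files, so the code only demonstrates the rule and is not a way to derive labels from the data.

```python
import numpy as np

PATCH_SIZE = 96   # assumed patch side length in pixels (not stated in the text)
CENTER_SIZE = 32  # side length of the labeled center region


def center_region(patch: np.ndarray, size: int = CENTER_SIZE) -> np.ndarray:
    """Return the central size x size window of an (H, W[, C]) array."""
    height, width = patch.shape[:2]
    top = (height - size) // 2
    left = (width - size) // 2
    return patch[top:top + size, left:left + size]


def patch_label(tumor_mask: np.ndarray) -> int:
    """Label is 1 if any tumor pixel falls inside the center region, else 0."""
    return int(center_region(tumor_mask).any())


# Hypothetical masks: tumor only in the outer border vs. one pixel in the center.
border_only = np.zeros((PATCH_SIZE, PATCH_SIZE), dtype=bool)
border_only[0:10, 0:10] = True
one_center_pixel = np.zeros((PATCH_SIZE, PATCH_SIZE), dtype=bool)
one_center_pixel[48, 48] = True

print(patch_label(border_only))       # tumor outside the center does not count
print(patch_label(one_center_pixel))  # a single center pixel is enough
```

The same `center_region` helper works on image arrays with a channel axis, which can be handy for visually checking what part of a patch the label refers to.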

The original PCam dataset contains duplicate images due to its probabilistic sampling; however, the version presented on Kaggle does not contain duplicates. We have otherwise maintained the same data and splits as the PCam benchmark.

Dataset Distribution

We draw a sample from the dataset in which each label has a similar number of instances, so that the classes are approximately balanced.
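One way to draw such a sample with pandas is sketched below. The column names `id` and `label`, the helper name `balanced_sample`, and the per-class sample size are assumptions for illustration; a toy table stands in for train_labels.csv.

```python
import pandas as pd


def balanced_sample(labels: pd.DataFrame, n_per_class: int,
                    label_col: str = "label", seed: int = 0) -> pd.DataFrame:
    """Draw n_per_class rows from every class, then shuffle the result."""
    parts = [
        group.sample(n=n_per_class, random_state=seed)
        for _, group in labels.groupby(label_col)
    ]
    return (pd.concat(parts)
            .sample(frac=1.0, random_state=seed)
            .reset_index(drop=True))


# Toy stand-in for train_labels.csv with an imbalanced label column.
toy = pd.DataFrame({
    "id": [f"img_{i}" for i in range(10)],
    "label": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
})
sample = balanced_sample(toy, n_per_class=3)
print(sample["label"].value_counts().to_dict())
```

With the real file, the toy table would be replaced by something like `pd.read_csv("train_labels.csv")`, and `n_per_class` must not exceed the size of the smallest class.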

Preprocessing

We can convert the TIFF images to JPEG files and copy them into a new directory.
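A minimal sketch of this conversion using Pillow follows. The function name, the directory names in the usage note, and the JPEG quality setting are assumptions, not values taken from the original notebook.

```python
from pathlib import Path

from PIL import Image


def convert_tiffs_to_jpeg(src_dir: str, dst_dir: str, quality: int = 95) -> int:
    """Convert every .tif file in src_dir to a .jpg file in dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = 0
    for tif_path in sorted(src.glob("*.tif")):
        with Image.open(tif_path) as img:
            # JPEG has no alpha channel, so force 3-channel RGB first.
            img.convert("RGB").save(dst / f"{tif_path.stem}.jpg", "JPEG",
                                    quality=quality)
        converted += 1
    return converted
```

A call such as `convert_tiffs_to_jpeg("train", "train_jpg")` would write one JPEG per TIFF and return the number of files converted. JPEG compression is lossy, so pixel values can differ slightly from the TIFF originals.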

Modeling: Keras Multi-layer Perceptron (MLP) for Image Classification

A multi-layer perceptron (MLP) is a class of feedforward artificial neural network (ANN). At each training iteration, the cross-entropy loss measures the disagreement between the predicted and the true labels; the gradient of this loss with respect to the model weights is then computed and used to update the weights. Over the course of this iterative process the loss decreases, so the predictions should agree with the true labels more closely than they did at the first step.
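The training loop described above can be sketched in plain NumPy to make each step visible. This is an illustration under stated assumptions, not the Keras model used in this notebook: the toy data, layer sizes, learning rate, and number of steps are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for flattened image patches: 200 samples, 20 features.
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer (ReLU) followed by a single sigmoid output unit.
W1 = rng.normal(scale=0.1, size=(20, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1))
b2 = np.zeros(1)
learning_rate = 0.2
losses = []

for step in range(300):
    # Forward pass.
    hidden = np.maximum(0.0, X @ W1 + b1)
    probs = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))

    # Binary cross-entropy, clipped to avoid log(0).
    p = np.clip(probs, 1e-7, 1 - 1e-7)
    losses.append(float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))))

    # Backward pass: gradient of the mean loss w.r.t. each parameter.
    d_logits = (probs - y) / len(X)
    grad_W2 = hidden.T @ d_logits
    grad_b2 = d_logits.sum(axis=0)
    d_hidden = (d_logits @ W2.T) * (hidden > 0)
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0)

    # Gradient descent update.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

Keras performs the same forward pass, loss evaluation, gradient computation, and weight update internally, typically on mini-batches and with an optimizer more elaborate than plain gradient descent.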

Here, we have a small dataset, which might lead to overfitting. To mitigate this, we can define a data augmentation function that generates additional training data from the existing examples by applying random transformations that yield believable-looking images.
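A data augmentation function of this kind might look like the following NumPy sketch. The choice of transformations (random flips and 90-degree rotations) is an assumption; it suits pathology patches because tissue has no canonical orientation, but it is not necessarily the set of transformations used in the original notebook.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random flip and a random 90-degree rotation to an (H, W, C) image."""
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)  # horizontal flip
    if rng.random() < 0.5:
        image = np.flip(image, axis=0)  # vertical flip
    quarter_turns = int(rng.integers(0, 4))  # 0, 90, 180, or 270 degrees
    return np.rot90(image, k=quarter_turns, axes=(0, 1)).copy()


rng = np.random.default_rng(42)
patch = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)  # stand-in patch
augmented = [augment(patch, rng) for _ in range(8)]
print(augmented[0].shape, augmented[0].dtype)
```

These transformations only rearrange pixels, so each augmented patch keeps the original pixel values and the same label. Keras also provides preprocessing layers such as `RandomFlip` and `RandomRotation` that play a similar role inside a model.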

Compiling and fitting the model
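A minimal sketch of compiling and fitting a Keras MLP is given below. The layer sizes, optimizer, batch size, number of epochs, and the 96x96x3 input shape are assumptions, and random arrays stand in for the image data so that the sketch runs without the dataset.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_mlp(input_shape=(96, 96, 3)) -> keras.Model:
    """Flatten the image, then apply two hidden dense layers and a sigmoid output."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])


model = build_mlp()
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",  # cross-entropy for the two-class case
    metrics=["accuracy"],
)

# Random stand-in data (hypothetical), scaled to [0, 1] like normalized pixels.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(64, 96, 96, 3)).astype("float32") / 255.0
y_train = rng.integers(0, 2, size=(64, 1)).astype("float32")

history = model.fit(x_train, y_train, epochs=3, batch_size=16,
                    validation_split=0.25, verbose=0)
print(history.history["loss"])
```

The `history.history` dictionary holds the per-epoch training and validation metrics, which is a convenient way to check whether more epochs are still improving the model.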

Here, we trained for only a few iterations; the model would need to be trained for more iterations to give more accurate results.


References

  1. Kaggle Dataset: Histopathologic Cancer Detection
  2. B. S. Veeling, J. Linmans, J. Winkens, T. Cohen, M. Welling. "Rotation Equivariant CNNs for Digital Pathology". arXiv:1806.03962
  3. Ehteshami Bejnordi et al. "Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer". JAMA: The Journal of the American Medical Association, 318(22), 2199–2210. doi:10.1001/jama.2017.14585